16 research outputs found
Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts
In this paper, we report a knowledge-based method for Word Sense
Disambiguation in the domains of biomedical and clinical text. We combine word
representations created on large corpora with a small number of definitions
from the UMLS to create concept representations, which we then compare to
representations of the context of ambiguous terms. Using no relational
information, we obtain comparable performance to previous approaches on the
MSH-WSD dataset, which is a well-known dataset in the biomedical domain.
Additionally, our method is fast and easy to set up and extend to other
domains. Supplementary materials, including source code, can be found at https:
//github.com/clips/yarnComment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical
Natural Language Processing, Berlin 201
Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings
We present an unsupervised context-sensitive spelling correction method for
clinical free-text that uses word and character n-gram embeddings. Our method
generates misspelling replacement candidates and ranks them according to their
semantic fit, by calculating a weighted cosine similarity between the
vectorized representation of a candidate and the misspelling context. To tune
the parameters of this model, we generate self-induced spelling error corpora.
We perform our experiments for two languages. For English, we greatly
outperform off-the-shelf spelling correction tools on a manually annotated
MIMIC-III test set, and counter the frequency bias of a noisy channel model,
showing that neural embeddings can be successfully exploited to improve upon
the state-of-the-art. For Dutch, we also outperform an off-the-shelf spelling
correction tool on manually annotated clinical records from the Antwerp
University Hospital, but can offer no empirical evidence that our method
counters the frequency bias of a noisy channel model in this case as well.
However, both our context-sensitive model and our implementation of the noisy
channel model obtain high scores on the test set, establishing a
state-of-the-art for Dutch clinical spelling correction with the noisy channel
model.Comment: Appears in volume 7 of the CLIN Journal,
http://www.clinjournal.org/biblio/volum
A Short Review of Ethical Challenges in Clinical Natural Language Processing
Clinical NLP has an immense potential in contributing to how clinical
practice will be revolutionized by the advent of large scale processing of
clinical records. However, this potential has remained largely untapped due to
slow progress primarily caused by strict data access policies for researchers.
In this paper, we discuss the concern for privacy and the measures it entails.
We also suggest sources of less sensitive data. Finally, we draw attention to
biases that can compromise the validity of empirical research and lead to
socially harmful applications.Comment: First Workshop on Ethics in Natural Language Processing (EACL'17
Unsupervised patient representations from clinical notes with interpretable classification decisions
We have two main contributions in this work: 1. We explore the usage of a
stacked denoising autoencoder, and a paragraph vector model to learn
task-independent dense patient representations directly from clinical notes. We
evaluate these representations by using them as features in multiple supervised
setups, and compare their performance with those of sparse representations. 2.
To understand and interpret the representations, we explore the best encoded
features within the patient representations obtained from the autoencoder
model. Further, we calculate the significance of the input features of the
trained classifiers when we use these pretrained representations as input.Comment: Accepted poster at NIPS 2017 Workshop on Machine Learning for Health
(https://ml4health.github.io/2017/